Some housekeeping (again), installing necessary packages.

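# Packages used throughout this script (the tidyverse provides tibble, dplyr, and tidyr)
list.of.packages <- c("igraph", "tidygraph", "ggraph", "tidyverse", "data.table", "magrittr")

new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
rm(list.of.packages, new.packages)

library(data.table)  # fread()
library(tidyverse)   # tibble(), %>%, mutate(), gather(), glimpse()
library(magrittr)    # %<>% assignment pipe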

Introduction

Welcome to Module 2. This script is available as an R notebook (pretty) as well as an executable Jupyter notebook on Kaggle here. Have fun!

Networks… So what?

So, before we talk about networks, one thing upfront… why should we? I mean, they undeniably look pretty, don’t they?

Somehow, the visualization of networks fascinates the human mind (find a short TED talk on networks and how they depict our world here), and it has even inspired its own art movement, networkism (see some examples here).

Yet, beyond that, do networks offer any analytical value that makes them worth a data scientist’s attention?

Networks in R

There are a number of applications designed for network analysis and the creation of network graphs, such as Gephi and Cytoscape. Though not specifically designed for it, R has developed into a powerful tool for network analysis.

Significant network analysis packages for R include the network, sna, and igraph packages. In addition, Thomas Lin Pedersen has recently released the tidygraph package, which leverages the power of igraph in a manner consistent with the tidyverse workflow. Even better, he tops it off with ggraph, a network visualization package with a consistent ggplot2 look and feel.

R can also be used to make interactive network graphs with the htmlwidgets framework, which translates R code to JavaScript. Cool implementations thereof are the visNetwork and networkD3 packages.

As the analytical tool, I will mostly use igraph in this lab. In terms of functions, it is pretty much equivalent to network, yet slightly more powerful, better integrated, and better maintained. Since both packages share many function names, it is better not to load them both at once.

The Basic Structure of Networks

The Basic Jargon

First of all, what is a network? Plainly speaking, a network is a system of elements which are connected by some relationship. The vocabulary can be a bit technical and even inconsistent between different disciplines, packages, and software. The whole system is (surprise, surprise) usually called a network or graph. The elements are commonly referred to as nodes (system theory jargon) or vertices (graph theory jargon) of a graph, while the connections are edges or links. I will mostly refer to the elements as nodes, and their connections as edges.

Generally, networks are a form of representing relational data. This is a very general tool that can be applied to many different types of relationships between all kinds of elements. The content, meaning, and interpretation of course depend on which elements we display and which types of relationships. For example:

  • In Social Network Analysis:
    • Nodes represent actors (which can be persons, firms and other socially constructed entities)
    • Edges represent relationships between these actors (friendship, interaction, co-affiliation, similarity, etc.)
  • Other types of network
    • Chemistry: Interaction between molecules
    • Computer Science: The World Wide Web, inter- and intranet topologies
    • Biology: Food-web, ant-hives

The possibilities to depict relational data are manifold. For example:

  • Relations among persons
    • Kinship: mother of, wife of…
    • Other role based: boss of, supervisor of…
    • Cognitive/perceptual: knows, aware of what they know…
    • Affective: likes, trusts…
    • Interaction: give advice, talks to…
    • Affiliation: belong to same clubs, shares same interests…
  • Relations among organizations
    • As corporate entities
      • Buy from / sell to, leases to, outsources to
      • Owns shares of, subsidiary of
      • Joint ventures, strategic alliances
    • Via their members
      • Personnel flows
      • Interlocking directorates
      • Personal friendships
      • Co-memberships
  • Relations among other (non-social) entities
    • Patents
      • Patents citing other patents
      • Co-occurrence of technological classes
    • Research fields
      • Through citations
      • Through people co-affiliated with fields
    • Sectors
      • Input-output relations
      • Labor mobility
    • Technologies
      • Patent IPC classes
      • Semantic co-occurrence

Note: Content matters! Each relation yields a different structure and has different effects. Theories that make sense in an inter-personal context might not hold in an inter-organizational or non-social one.

The Data-Structure of Relational Data

Edgelist

Most real-world relational data comes in what we call an edge list: a data frame that contains a minimum of two columns, one column of nodes that are the source of a connection and another column of nodes that are the target of the connection. The nodes in the data are identified by unique IDs. If the distinction between source and target is meaningful, the network is directed. If the distinction is not meaningful, the network is undirected (more on that later). So, every row that contains the ID of one element in column 1 and the ID of another element in column 2 indicates that a connection between them exists. An edge list can also contain additional columns that describe attributes of the edges, such as their magnitude. If the edges have a magnitude attribute, the graph is considered weighted (more on that later). Below is an example of a minimal edge list created with the tibble() function.

edge_list <- tibble(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))
edge_list

Sometimes it is preferable to also create a separate node list. At its simplest, a node list is a data frame with a single column - which I will label as “id” - that lists the node IDs found in the edge list. The advantage of creating a separate node list is the ability to add attribute columns to the data frame such as the names of the nodes or any kind of groupings.

node_list <- tibble(id = 1:4, group = sample(letters[1:2], 4, replace = TRUE))
node_list
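
If no node list is at hand, one can be derived from the edge list itself. A minimal base-R sketch (the names toy_edges and toy_nodes and the label column are only illustrative, and a plain data.frame stands in for the tibble):

```r
# The same toy edge list as above, here as a plain data.frame
toy_edges <- data.frame(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))

# Every ID that appears as source or target is a node
node_ids <- sort(unique(c(toy_edges$from, toy_edges$to)))
toy_nodes <- data.frame(id = node_ids)

# Attribute columns can then be added per node (hypothetical labels)
toy_nodes$label <- paste("node", toy_nodes$id)
toy_nodes
```

Deriving the IDs from both columns makes sure that nodes which only ever appear as a target are not lost.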

Adjacency Matrix

A second popular form of network representation is the adjacency matrix (also called socio-matrix). It is an \(n \times n\) matrix, where \(n\) is the number of elements whose relationships are represented. The value in the cell at the intersection of row \(i\) and column \(j\) indicates whether an edge from element \(i\) to element \(j\) is present (=1) or absent (=0).

Tip: Given an edgelist, an adjacency matrix can easily be produced by crosstabulating:

adj_matrix <- table(edge_list) %>% as.matrix()
adj_matrix
##     to
## from 1 2 3 4
##    1 0 1 0 0
##    2 0 0 1 1
##    3 0 1 0 0
##    4 1 0 0 0
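
To see what the crosstabulation does under the hood, the matrix can also be filled by hand. The following is only a sketch in base R, with the illustrative names toy_edges, adj, and adj_undirected:

```r
# The same toy edge list as above, as a plain data.frame
toy_edges <- data.frame(from = c(1, 2, 2, 3, 4), to = c(2, 3, 4, 2, 1))

# Start with an empty n x n matrix ...
n <- max(toy_edges$from, toy_edges$to)
adj <- matrix(0, nrow = n, ncol = n, dimnames = list(1:n, 1:n))

# ... and set cell (i, j) to 1 for every edge from i to j
adj[cbind(toy_edges$from, toy_edges$to)] <- 1
adj

# For an undirected reading, an edge in either direction counts for both cells
adj_undirected <- pmax(adj, t(adj))
adj_undirected
```

A directed network generally has an asymmetric matrix, while the undirected version is symmetric by construction.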

Generating a Graph Object in igraph

To create an igraph object from an edge-list data frame we can use the graph_from_data_frame() function, which is a bit more straightforward than network(). There are three arguments in the graph_from_data_frame() function: d, vertices, and directed. Here, d refers to the edge list, vertices to the node list, and directed can be either TRUE or FALSE depending on whether the data is directed or undirected. By default, graph_from_data_frame() treats the first two columns of d as the edge list and any remaining columns as edge attributes.

library(igraph)
g <- graph_from_data_frame(d = edge_list, vertices = node_list, directed = FALSE)
g
## IGRAPH fd9f11e UN-- 4 5 -- 
## + attr: name (v/c), group (v/c)
## + edges from fd9f11e (vertex names):
## [1] 1--2 2--3 2--4 2--3 1--4

Let’s inspect the resulting object. An igraph graph object summary reveals some interesting information.

  • First, it tells us the graph type: undirected (U) or directed (D), followed by N if the nodes are named
  • Afterwards, the number of nodes (4) and edges (5)
  • Followed by the attributes, which in this case are the node-level variables name and group (attr: name (v/c), group (v/c))
  • Lastly, a list of all existing edges. Note: n--m indicates an undirected, n->m a directed edge.
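
The same function also covers directed and weighted data. A small sketch with made-up nodes a to d (all object names and values here are hypothetical):

```r
library(igraph)

# Hypothetical edge list: the third column becomes an edge attribute
toy_edges <- data.frame(from = c("a", "b", "c"),
                        to = c("b", "c", "a"),
                        weight = c(1, 2, 3),
                        stringsAsFactors = FALSE)

# Hypothetical node list: "d" has no edges but still becomes a node
toy_nodes <- data.frame(id = c("a", "b", "c", "d"),
                        group = c("x", "x", "y", "y"),
                        stringsAsFactors = FALSE)

g_dir <- graph_from_data_frame(d = toy_edges, vertices = toy_nodes, directed = TRUE)
g_dir

is_directed(g_dir)  # the direction of each edge is kept
is_weighted(g_dir)  # a column called weight makes the graph weighted
```

Passing a node list is the only way to get nodes without any edge, such as d, into the graph.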

Let’s take a look at the structure of the object:

glimpse(g[[1]])
## List of 1
##  $ 1: 'igraph.vs' Named int [1:2] 2 4
##   ..- attr(*, "names")= chr [1:2] "2" "4"
##   ..- attr(*, "env")=<weakref> 
##   ..- attr(*, "graph")= chr "fd9f11e3-c69f-11e8-9859-73db0ee6b157"

We see that the object has a list format: it consists of separate lists for every node, each containing some attributes that are irrelevant for now and the node’s adjacent nodes, which capture its ego-network (e.g., ..- attr(*, "names")= chr [1:2] "2" "4").

We can also plot it to take a look. An igraph object can be passed directly to the plot() function. The results can be adjusted with a set of parameters we will discover later. It’s not super pretty, so we will later also explore more powerful plotting tools for graphs. However, it’s quick and dirty, so let’s go with it for now.

plot(g)

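Yeah, that’s the graph. We can also use the adjacency matrix to create almost the same graph. With mode = "undirected", the cells (2,3) and (3,2) collapse into a single edge, so the result has 4 edges instead of 5.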

g <- graph_from_adjacency_matrix(adj_matrix, mode = "undirected")
g
## IGRAPH fdc43a6 UN-- 4 4 -- 
## + attr: name (v/c)
## + edges from fdc43a6 (vertex names):
## [1] 1--2 1--4 2--3 2--4

We can inspect and manipulate the nodes via V(g) (V for vertices, it’s graph-theory slang) and the edges via E(g).

V(g)
## + 4/4 vertices, named, from fdc43a6:
## [1] 1 2 3 4
E(g)
## + 4/4 edges from fdc43a6 (vertex names):
## [1] 1--2 1--4 2--3 2--4

We can also use most of the base-R slicing&dicing.

V(g)[1:3]
## + 3/4 vertices, named, from fdc43a6:
## [1] 1 2 3
E(g)[2:4]
## + 3/4 edges from fdc43a6 (vertex names):
## [1] 1--4 2--3 2--4

Remember, it’s a list-object. So, if we just want to have the values, we have to use the double bracket [[x]].

V(g)[[1:3]]
## + 3/4 vertices, named, from fdc43a6:
##   name
## 1    1
## 2    2
## 3    3

We can also use the $ notation.

V(g)$name
## [1] "1" "2" "3" "4"
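
The $ notation works in both directions: it reads attributes, but it also creates them, and the new attributes can be used right away for selecting nodes and edges. A small sketch on a toy graph (g_toy, group, and the weights are made up for illustration):

```r
library(igraph)

# Toy graph with an explicit node list, so the node order is 1, 2, 3, 4
g_toy <- graph_from_data_frame(
  d = data.frame(from = c(1, 2, 2, 4), to = c(2, 3, 4, 1)),
  vertices = data.frame(id = 1:4),
  directed = FALSE
)

# Assigning with $ creates node and edge attributes (one value per node / edge)
V(g_toy)$group <- c("a", "a", "b", "b")
E(g_toy)$weight <- c(5, 1, 3, 2)

# Attributes can be used inside the brackets to select nodes and edges
V(g_toy)[group == "a"]$name
E(g_toy)[weight > 2]
```

The values are matched to nodes and edges by position, which is why it helps to fix the node order with an explicit node list.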

Networks are coming…

So, let’s get serious. Appropriate for the weather these days in Denmark, the theme is “winter is coming…”. Therefore, we will have some fun analysing the Game of Thrones data provided by Andrew Beveridge. It is a set of character interaction networks for George R. R. Martin’s “A Song of Ice and Fire” saga (yes, we are talking about the books…). These networks were created by connecting two characters whenever their names (or nicknames) appeared within 15 words of one another in one of the books in “A Song of Ice and Fire.” The edge weight corresponds to the number of interactions. Building such a network on your own is a nice skill you will have after the second part of M2.

Build the graph

edges.cooc.all <- fread("../input/GoT/asoiaf-all-edges.csv", data.table = FALSE) 
nodes.cooc.all <- fread("../input/GoT/asoiaf-all-nodes.csv", data.table = FALSE) 
head(edges.cooc.all)

So, that’s what we have: a classical edge list, with id1 in column 1 and id2 in column 2. Note that the edges are weighted in this case. I don’t like the separating “-” in the names, so let’s get rid of them.

colnames(edges.cooc.all) <- tolower(colnames(edges.cooc.all))
edges.cooc.all %<>%
  mutate(source = gsub("-", " ", source),
         target = gsub("-", " ", target)) 

Ok, let’s see how many characters we have overall.

edges.cooc.all %>%
  select(-type) %>%
  gather(x, name, source:target) %>%
  pull(name) %>%  # extract the name column, so that we count characters rather than rows
  n_distinct()

That’s a lot. We might want to restrict ourselves first to a small set of the 50 most connected nodes.

chars.main <- edges.cooc.all %>%
  select(-type) %>%
  gather(x, name, source:target) %>%
  group_by(name) %>%
  summarise(sum_weight = sum(weight)) %>%
  ungroup() %>%
  arrange(desc(sum_weight)) %>%
  top_n(50, sum_weight)

head(chars.main)

So far so good. If we only go by edge weights, it seems as if Tyrion might make it… my favorite anyhow. Let’s reduce our edge list to these main characters, just to warm up and keep the overview.

edges.cooc <- edges.cooc.all %>%
  filter(source %in% chars.main$name & target %in% chars.main$name) %>%
  select(source, target, weight)
g <- graph_from_data_frame(d = edges.cooc, directed = FALSE)
g
## IGRAPH fe2183d UNW- 50 402 -- 
## + attr: name (v/c), weight (e/n)
## + edges from fe2183d (vertex names):
##  [1] Aemon Targaryen (Maester Aemon)--Grenn            
##  [2] Aemon Targaryen (Maester Aemon)--Jeor Mormont     
##  [3] Aemon Targaryen (Maester Aemon)--Jon Snow         
##  [4] Aemon Targaryen (Maester Aemon)--Mance Rayder     
##  [5] Aemon Targaryen (Maester Aemon)--Robert Baratheon 
##  [6] Aemon Targaryen (Maester Aemon)--Samwell Tarly    
##  [7] Aemon Targaryen (Maester Aemon)--Stannis Baratheon
##  [8] Arya Stark                     --Bran Stark       
## + ... omitted several edges

Note that this co-occurrence network is weighted (by the number of co-occurrences) and undirected.

is_weighted(g)
## [1] TRUE
is_directed(g)
## [1] FALSE

Inspect the graph

Overall graph attributes

We already know from the summary, but we can also count the number of nodes and edges as follows:

# Count number of edges
gsize(g)
## [1] 402
# Count number of vertices
gorder(g)
## [1] 50

We can give the graph a first plot to see what happens there. It’s not pretty, but we will fine-tune it later.

plot(g,
     edge.width = (E(g)$weight / max(E(g)$weight)) * 5  # Only argument we use so far for the weighted edges
     )

It is really not pretty, but we will work on that later. However, we already see that some nodes are not connected (isolated), so let’s drop them for our network analysis. Let’s also get rid of edges with a very small weight.

g <- delete_edges(g, E(g)[weight < 20]) # How to delete edges in a graph object
g <- delete_vertices(g, degree(g) == 0) # How to delete nodes in a graph object
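
The order of the two steps matters: a node may only become isolated once its weak edges are gone. A toy sketch of the same two steps (the graph g_toy and the threshold of 20 are only for illustration):

```r
library(igraph)

# Hypothetical weighted edge list: D is only attached via a weak edge
toy_edges <- data.frame(from = c("A", "A", "B", "C"),
                        to = c("B", "C", "C", "D"),
                        weight = c(30, 5, 25, 10),
                        stringsAsFactors = FALSE)
g_toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Step 1: drop weak edges; D now has no edges left
g_toy <- delete_edges(g_toy, E(g_toy)[weight < 20])
degree(g_toy)

# Step 2: drop nodes without edges; degree() is evaluated on the pruned graph
g_toy <- delete_vertices(g_toy, V(g_toy)[degree(g_toy) == 0])
g_toy
```

Running step 2 first would have kept D, since it still had an edge at that point.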

Edges

Let’s look at some of the options we have to select edges. We could, for example, use inc() to select only edges that include a certain node.

Note: Almost all edge functions, and many of the graph functions to follow, have an argument called mode. Here, it doesn’t matter, since we have an undirected network. In case the edges are directed, we want to specify which mode to consider (mode = c("all", "out", "in", "total")). More on that at a later point.

E(g)[[inc("Daenerys Targaryen")]]  
## + 6/189 edges from fe64c76 (vertex names):
##                  tail               head tid hid weight
## 13    Barristan Selmy Daenerys Targaryen   3   9     83
## 68 Daenerys Targaryen              Drogo   9  11    123
## 69 Daenerys Targaryen   Hizdahr zo Loraq   9  16     96
## 70 Daenerys Targaryen      Jorah Mormont   9  24    161
## 71 Daenerys Targaryen    Quentyn Martell   9  49     58
## 72 Daenerys Targaryen   Robert Baratheon   9  39     26

We could also select them by their attributes, such as their weight.

E(g)[[weight >= 150]]
## + 11/189 edges from fe64c76 (vertex names):
##                   tail              head tid hid weight
## 12          Arya Stark       Sansa Stark   2  43    155
## 20          Bran Stark             Hodor   4  17    209
## 26          Bran Stark        Robb Stark   4  38    169
## 55    Cersei Lannister Joffrey Baratheon   8  21    173
## 65    Cersei Lannister  Tyrion Lannister   8  47    209
## 70  Daenerys Targaryen     Jorah Mormont   9  24    161
## 83        Eddard Stark  Robert Baratheon  12  39    334
## 112       Jeor Mormont          Jon Snow  20  23    174
## 123  Joffrey Baratheon       Sansa Stark  21  43    222
## 126  Joffrey Baratheon  Tyrion Lannister  21  47    219
## 134           Jon Snow     Samwell Tarly  23  41    228

Or plot their weight distribution

hist(E(g)$weight)

There is much more, but let’s leave it at that for now.

Nodes

The node level gives us the opportunity to select (in this case) certain characters, their connectivity characteristics, and their environment.

First, we look at different centrality measures, each of which indicates a certain type of structural importance in the network.

Degree centrality

The first of these centralities is degree centrality, which plainly means how well connected the node is (= number of edges). The version for weighted networks (= sum of edge weights) can be accessed with strength(). We can also directly check which node has the maximum degree; note that which.max() returns the position of that node in the vector, not its degree.

degree(g) 
## Aemon Targaryen (Maester Aemon)                      Arya Stark 
##                               3                               9 
##                 Barristan Selmy                      Bran Stark 
##                               5                              14 
##                Brienne of Tarth                           Bronn 
##                               4                               2 
##                   Catelyn Stark                Cersei Lannister 
##                              16                              20 
##              Daenerys Targaryen                  Davos Seaworth 
##                               6                               2 
##                           Drogo                    Eddard Stark 
##                               2                              18 
##                    Edmure Tully                  Gregor Clegane 
##                               3                               4 
##                           Grenn                Hizdahr zo Loraq 
##                               2                               2 
##                           Hodor                      Ilyn Payne 
##                               4                               4 
##                 Jaime Lannister                    Jeor Mormont 
##                              16                               4 
##               Joffrey Baratheon                      Jojen Reed 
##                              19                               3 
##                        Jon Snow                   Jorah Mormont 
##                              13                               3 
##                    Loras Tyrell                           Luwin 
##                               6                               6 
##                      Lysa Arryn                    Mance Rayder 
##                               4                               1 
##                 Margaery Tyrell                      Meera Reed 
##                               5                               3 
##                      Melisandre                     Meryn Trant 
##                               3                               3 
##              Myrcella Baratheon                   Petyr Baelish 
##                               3                              10 
##                         Pycelle                 Renly Baratheon 
##                               6                              10 
##                    Rickon Stark                      Robb Stark 
##                               6                              17 
##                Robert Baratheon                   Rodrik Cassel 
##                              16                               5 
##                   Samwell Tarly                  Sandor Clegane 
##                               5                               6 
##                     Sansa Stark               Stannis Baratheon 
##                              17                              13 
##                   Theon Greyjoy                Tommen Baratheon 
##                               6                               6 
##                Tyrion Lannister                 Tywin Lannister 
##                              25                               9 
##                 Quentyn Martell                           Varys 
##                               2                               7
which.max(degree(g))
## Tyrion Lannister 
##               47
strength(g)
## Aemon Targaryen (Maester Aemon)                      Arya Stark 
##                             234                             512 
##                 Barristan Selmy                      Bran Stark 
##                             226                            1206 
##                Brienne of Tarth                           Bronn 
##                             225                             160 
##                   Catelyn Stark                Cersei Lannister 
##                             766                            1438 
##              Daenerys Targaryen                  Davos Seaworth 
##                             547                             195 
##                           Drogo                    Eddard Stark 
##                             151                            1175 
##                    Edmure Tully                  Gregor Clegane 
##                             128                             102 
##                           Grenn                Hizdahr zo Loraq 
##                             121                             138 
##                           Hodor                      Ilyn Payne 
##                             333                             112 
##                 Jaime Lannister                    Jeor Mormont 
##                             862                             277 
##               Joffrey Baratheon                      Jojen Reed 
##                            1343                             223 
##                        Jon Snow                   Jorah Mormont 
##                            1238                             216 
##                    Loras Tyrell                           Luwin 
##                             167                             238 
##                      Lysa Arryn                    Mance Rayder 
##                             179                             112 
##                 Margaery Tyrell                      Meera Reed 
##                             258                             255 
##                      Melisandre                     Meryn Trant 
##                             208                             102 
##              Myrcella Baratheon                   Petyr Baelish 
##                              87                             477 
##                         Pycelle                 Renly Baratheon 
##                             193                             448 
##                    Rickon Stark                      Robb Stark 
##                             263                             966 
##                Robert Baratheon                   Rodrik Cassel 
##                            1091                             137 
##                   Samwell Tarly                  Sandor Clegane 
##                             472                             259 
##                     Sansa Stark               Stannis Baratheon 
##                            1059                             771 
##                   Theon Greyjoy                Tommen Baratheon 
##                             239                             337 
##                Tyrion Lannister                 Tywin Lannister 
##                            1694                             392 
##                 Quentyn Martell                           Varys 
##                              80                             400
which.max(strength(g))
## Tyrion Lannister 
##               47

Well, again Tyrion it is…
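
Degree and strength do not have to agree, though. A toy sketch (made-up graph g_toy) in which the node with the most edges is not the one with the largest summed weight:

```r
library(igraph)

# Hypothetical graph: A has three weak edges, B and C share one strong edge
toy_edges <- data.frame(from = c("A", "A", "A", "B"),
                        to = c("B", "C", "D", "C"),
                        weight = c(1, 1, 1, 10),
                        stringsAsFactors = FALSE)
g_toy <- graph_from_data_frame(toy_edges, directed = FALSE)

deg <- degree(g_toy)    # number of edges per node
str <- strength(g_toy)  # sum of edge weights per node

deg
str

# which.max() gives the position of the first maximum; names() gives the node
names(which.max(deg))
names(which.max(str))
```

Here A wins on degree, while B comes first on strength (tied with C, and which.max() reports the first maximum).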

Eigenvector Centrality

Next, the eigenvector centrality, a very interesting measure. The idea is that a node is more important the more it is connected to other important nodes. So, if you have cool friends, you are also supposed to be somewhat cooler. As you can already see, the calculation of such values in a network becomes circular; therefore, we need some mathematical magic, which the eigenvector provides. We will dig deeper into this concept at another point. We scale the values so that the most central node gets a value of 1.

Note: A weighted version is also available. In fact, eigen_centrality() uses the edge attribute weight by default if the graph has one (pass weights = NA to ignore it). Again, the ? is your friend.

eigen_centrality(g, scale = TRUE)$vector %>% round(3)
## Aemon Targaryen (Maester Aemon)                      Arya Stark 
##                           0.064                           0.332 
##                 Barristan Selmy                      Bran Stark 
##                           0.073                           0.297 
##                Brienne of Tarth                           Bronn 
##                           0.126                           0.183 
##                   Catelyn Stark                Cersei Lannister 
##                           0.415                           0.952 
##              Daenerys Targaryen                  Davos Seaworth 
##                           0.039                           0.060 
##                           Drogo                    Eddard Stark 
##                           0.007                           0.754 
##                    Edmure Tully                  Gregor Clegane 
##                           0.067                           0.072 
##                           Grenn                Hizdahr zo Loraq 
##                           0.035                           0.008 
##                           Hodor                      Ilyn Payne 
##                           0.079                           0.086 
##                 Jaime Lannister                    Jeor Mormont 
##                           0.547                           0.113 
##               Joffrey Baratheon                      Jojen Reed 
##                           0.922                           0.046 
##                        Jon Snow                   Jorah Mormont 
##                           0.364                           0.038 
##                    Loras Tyrell                           Luwin 
##                           0.112                           0.061 
##                      Lysa Arryn                    Mance Rayder 
##                           0.121                           0.047 
##                 Margaery Tyrell                      Meera Reed 
##                           0.200                           0.051 
##                      Melisandre                     Meryn Trant 
##                           0.065                           0.089 
##              Myrcella Baratheon                   Petyr Baelish 
##                           0.067                           0.353 
##                         Pycelle                 Renly Baratheon 
##                           0.156                           0.267 
##                    Rickon Stark                      Robb Stark 
##                           0.098                           0.428 
##                Robert Baratheon                   Rodrik Cassel 
##                           0.755                           0.044 
##                   Samwell Tarly                  Sandor Clegane 
##                           0.127                           0.186 
##                     Sansa Stark               Stannis Baratheon 
##                           0.723                           0.367 
##                   Theon Greyjoy                Tommen Baratheon 
##                           0.091                           0.270 
##                Tyrion Lannister                 Tywin Lannister 
##                           1.000                           0.333 
##                 Quentyn Martell                           Varys 
##                           0.004                           0.345
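
The “mathematical magic” is less mysterious than it sounds. One way to resolve the circularity is to start with equal scores and repeatedly replace each node’s score by the sum of its neighbors’ scores (power iteration). This is only a sketch on a small unweighted toy graph, not necessarily how igraph computes it internally; g_toy and the number of iterations are arbitrary choices:

```r
library(igraph)

# Small unweighted toy graph: a triangle A-B-C plus a pendant node D
g_toy <- graph_from_literal(A-B, A-C, A-D, B-C)
adj <- as_adjacency_matrix(g_toy, sparse = FALSE)

# Power iteration: everybody starts with the same score
x <- rep(1, nrow(adj))
for (i in 1:200) {
  x <- as.vector(adj %*% x)  # new score = sum of the neighbors' scores
  x <- x / max(x)            # rescale so that the maximum is 1
}
names(x) <- rownames(adj)
round(x, 3)

# Compare with igraph's result
ev <- eigen_centrality(g_toy)$vector
round(ev, 3)
```

After enough rounds the scores stop changing, and that fixed point is the eigenvector belonging to the largest eigenvalue of the adjacency matrix.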

Betweenness Centrality

Lastly, the betweenness centrality, which is also a very interesting concept. It measures how much a node connects otherwise unconnected nodes with each other, captured by the number of shortest paths leading through the node.

betweenness(g)
## Aemon Targaryen (Maester Aemon)                      Arya Stark 
##                             0.0                            19.0 
##                 Barristan Selmy                      Bran Stark 
##                            93.0                            17.5 
##                Brienne of Tarth                           Bronn 
##                             0.0                             0.0 
##                   Catelyn Stark                Cersei Lannister 
##                            94.0                            96.5 
##              Daenerys Targaryen                  Davos Seaworth 
##                             0.0                             0.0 
##                           Drogo                    Eddard Stark 
##                             0.0                           138.0 
##                    Edmure Tully                  Gregor Clegane 
##                             0.0                            15.0 
##                           Grenn                Hizdahr zo Loraq 
##                             0.0                             0.0 
##                           Hodor                      Ilyn Payne 
##                            85.0                             8.5 
##                 Jaime Lannister                    Jeor Mormont 
##                            67.5                            57.0 
##               Joffrey Baratheon                      Jojen Reed 
##                            63.0                             0.0 
##                        Jon Snow                   Jorah Mormont 
##                           116.0                            47.5 
##                    Loras Tyrell                           Luwin 
##                            57.5                           133.0 
##                      Lysa Arryn                    Mance Rayder 
##                             0.0                             0.0 
##                 Margaery Tyrell                      Meera Reed 
##                             0.0                             0.0 
##                      Melisandre                     Meryn Trant 
##                            19.0                             0.0 
##              Myrcella Baratheon                   Petyr Baelish 
##                             1.0                             7.0 
##                         Pycelle                 Renly Baratheon 
##                            25.0                            31.0 
##                    Rickon Stark                      Robb Stark 
##                            73.0                            98.5 
##                Robert Baratheon                   Rodrik Cassel 
##                           155.5                            25.0 
##                   Samwell Tarly                  Sandor Clegane 
##                            20.0                            13.0 
##                     Sansa Stark               Stannis Baratheon 
##                           137.0                           103.0 
##                   Theon Greyjoy                Tommen Baratheon 
##                            30.0                             0.0 
##                Tyrion Lannister                 Tywin Lannister 
##                           309.0                            24.0 
##                 Quentyn Martell                           Varys 
##                             0.0                             0.0
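
One thing to keep in mind with weighted graphs: if a weight attribute exists, betweenness() uses it by default and interprets it as a distance. In a co-occurrence network a high weight rather signals closeness, so it is worth deciding explicitly how weights should enter. A toy sketch (g_toy is made up, and inverting the weights is just one possible convention, assumed here for illustration):

```r
library(igraph)

# Hypothetical triangle: A and C co-occur a lot, both co-occur less with B
toy_edges <- data.frame(from = c("A", "B", "A"),
                        to = c("B", "C", "C"),
                        weight = c(10, 10, 30),
                        stringsAsFactors = FALSE)
g_toy <- graph_from_data_frame(toy_edges, directed = FALSE)

# Ignoring weights: every pair is directly connected, nobody is in between
betweenness(g_toy, weights = NA)

# Default: weight is read as distance, so A-B-C (20) beats A-C (30)
betweenness(g_toy)

# Inverted weights: strong ties become short distances
betweenness(g_toy, weights = 1 / E(g_toy)$weight)
```

With the default, B appears as a broker only because the strong A-C tie is treated as a long way to go.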

Neighborhood of a Node

Next, we can look at the surroundings of a node, meaning the nodes it is connected to: its neighborhood.

neighbors(g, 'Robert Baratheon')
## + 16/50 vertices, named, from fe64c76:
##  [1] Barristan Selmy    Catelyn Stark      Cersei Lannister  
##  [4] Daenerys Targaryen Eddard Stark       Jaime Lannister   
##  [7] Joffrey Baratheon  Jon Snow           Petyr Baelish     
## [10] Pycelle            Renly Baratheon    Sansa Stark       
## [13] Stannis Baratheon  Tyrion Lannister   Tywin Lannister   
## [16] Varys

Likewise, we can look at the ego-network of a node, meaning all nodes within a certain geodesic distance. Plainly speaking: which nodes are not more than x steps away.

ego(g, 2, "Drogo")[[1]]
## + 8/50 vertices, named, from fe64c76:
## [1] Drogo              Daenerys Targaryen Jorah Mormont     
## [4] Barristan Selmy    Hizdahr zo Loraq   Robert Baratheon  
## [7] Quentyn Martell    Tyrion Lannister
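
The difference between neighbors() and ego() is easiest to see on a simple chain of nodes. A toy sketch (the path graph g_path is made up for illustration):

```r
library(igraph)

# A chain of five nodes: A - B - C - D - E
g_path <- graph_from_literal(A-B-C-D-E)

# Direct neighbors of C (exactly one step away, C itself not included)
neighbors(g_path, "C")$name

# Ego-network of A with order 2: A itself plus everything within two steps
ego(g_path, order = 2, nodes = "A")[[1]]$name

# If only the size is of interest
ego_size(g_path, order = 2, nodes = "A")
```

Note that ego() returns a list with one element per requested node, hence the [[1]].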

We can not only look at it, but also produce a new sub-graph containing only this ego-network.

g.drogo <- make_ego_graph(g, 2, nodes = "Drogo")[[1]]
g.danny <- make_ego_graph(g, 2, nodes = "Daenerys Targaryen")[[1]]
plot(g.drogo)

plot(g.danny)

Well, Danny seems to be way more connected than her husband, right?

Btw: To merge two graphs, just use +. Since both graphs are named, igraph takes their union by vertex name:

g.merge <- g.drogo + g.danny
g.merge
## IGRAPH fee7f58 UN-- 21 82 -- 
## + attr: name (v/c), weight_1 (e/n), weight_2 (e/n)
## + edges from fee7f58 (vertex names):
##  [1] Stannis Baratheon--Tywin Lannister  
##  [2] Renly Baratheon  --Stannis Baratheon
##  [3] Pycelle          --Varys            
##  [4] Petyr Baelish    --Varys            
##  [5] Petyr Baelish    --Sansa Stark      
##  [6] Petyr Baelish    --Pycelle          
##  [7] Jon Snow         --Stannis Baratheon
##  [8] Joffrey Baratheon--Varys            
## + ... omitted several edges
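What + does here is easier to see on a tiny example. The sketch below uses two made-up named graphs that share some vertices. Since both are named, igraph matches vertices by name and builds the union. The weight_1 and weight_2 attributes in the output above arise the same way: both inputs carry an edge attribute called weight, so the union keeps one copy per input graph.

```r
library(igraph)

# Two small named graphs that share the vertices B and C (hypothetical data)
g1 <- graph_from_literal(A - B, B - C)
g2 <- graph_from_literal(B - C, C - D)

# Both graphs are named, so `+` matches vertices by name and takes the union
g.both <- g1 + g2

V(g.both)$name   # every name appears only once
ecount(g.both)   # the shared edge B--C is counted once
```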

(Global) Network structure

Finally, it is often also informative to look at the overall characteristics of the network. We will do this in more detail next time, but just so you know:

The density of a network is the share of realized connections among all possible connections in the network.

edge_density(g)
## [1] 0.1542857
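As a quick sanity check, the density of an undirected network with n nodes and m edges can be computed by hand as m / (n(n-1)/2). A small sketch on a made-up toy graph:

```r
library(igraph)

# Toy graph with 4 nodes and 3 edges (hypothetical, not from the GoT data)
toy <- graph_from_literal(A - B, B - C, C - D)

n <- vcount(toy)
m <- ecount(toy)

# Realized edges divided by the number of possible undirected edges
manual <- m / (n * (n - 1) / 2)   # 3 / 6

manual
edge_density(toy)
```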

Transitivity, also called the clustering coefficient, indicates how much the network tends to be locally clustered. It is measured by the share of closed triplets among all connected triplets. Again, we will dig into that next time.

transitivity(g)
## [1] 0.4552807
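To see where such a number comes from, here is a minimal sketch that counts triplets by hand on a made-up toy graph: one triangle with a single extra node attached. Each triangle closes three triplets, so the global transitivity is 3 × triangles / connected triplets.

```r
library(igraph)

# Triangle A-B-C plus a pendant node D attached to C (hypothetical data)
toy <- graph_from_literal(A - B, B - C, A - C, C - D)

# Connected triplets: for each node, every pair of its neighbors forms one
deg <- degree(toy)
triplets <- sum(choose(deg, 2))

# count_triangles() reports triangles per node, so each one is counted 3 times
triangles <- sum(count_triangles(toy)) / 3

manual <- 3 * triangles / triplets
manual
transitivity(toy)   # global transitivity by default
```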

The diameter is the longest of the shortest paths between any two nodes of the network.

diameter(g, directed = FALSE, weights = NA)
## [1] 4

Finally, the mean distance, or average path length, is the mean of the shortest paths between all pairs of nodes. It is a measure of diffusion potential within a network.

mean_distance(g, directed = FALSE)
## [1] 2.337959
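Both measures are summaries of the same matrix of shortest-path lengths. The sketch below derives them by hand on a made-up, unweighted path graph and compares the result with the igraph functions.

```r
library(igraph)

# Toy path graph A - B - C - D (hypothetical, unweighted)
toy <- graph_from_literal(A - B - C - D)

# Matrix of shortest-path lengths between every pair of nodes
d <- distances(toy)

# Keep each unordered pair of nodes once
pairs <- d[upper.tri(d)]

max(pairs)    # longest shortest path
diameter(toy, directed = FALSE)

mean(pairs)   # average path length
mean_distance(toy, directed = FALSE)
```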

Your turn

Ahh, you saw it coming, right? How about you explore the GoT network a bit on your own HERE. Let's see how that works out!

Directed Networks are coming…

So far so good. Up to now we have considered undirected networks, constructed from how often characters co-occur. However, as you might already guess, that’s not where we stop. Some general information on directed networks:

  • Relations do not necessarily have to be symmetric. Rather, we often find a logical or empirical direction in role-based, affective, and interactive relationships.
  • E.g. is supervisor of…, gives advice to…, is supplier of…, invests in…
  • This direction is often crucial for the correct interpretation.
  • It introduces new measurements, e.g. the differentiation of degrees into out-degrees (sending a tie) and in-degrees (receiving a tie).
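The split into out- and in-degrees can be sketched on a tiny made-up advice network, where an arrow points from the person giving advice to the person receiving it. The names are hypothetical.

```r
library(igraph)

# Hypothetical advice network: in graph_from_literal(), `-+` draws an arrow
# from the left node to the right node
adv <- graph_from_literal(Ann -+ Bob, Ann -+ Cat, Bob -+ Cat)

out.deg <- degree(adv, mode = "out")   # ties sent: advice given
in.deg  <- degree(adv, mode = "in")    # ties received: advice taken

out.deg
in.deg
```

Ann only gives advice and Cat only receives it, something an undirected degree count would hide.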

An obvious example in the GoT case is family ties. Here, I will use the nicely compiled dataset of the wonderful Shirin that can be found here. It contains a nodelist with house affiliations and further characteristics of the main characters, and an edgelist of their family relationships.

Load the graph

rm(chars.main, g, g.danny, g.drogo, g.merge)
edges.fam <- readRDS("../input/GoT/union_edges.RDS")
nodes.fam <- readRDS("../input/GoT/union_characters.RDS")
head(nodes.fam)
head(edges.fam)

Let's construct the graph. We now also feed in a set of nodes with their characteristics.

g <- graph_from_data_frame(edges.fam, 
                           vertices = nodes.fam,
                           directed = TRUE)
g
## IGRAPH ff1ceaf DN-- 208 404 -- 
## + attr: name (v/c), male (v/n), culture (v/c), house (v/c),
## | popularity (v/n), house2 (v/c), color (v/c), shape (v/c), type
## | (e/c), color (e/c), lty (e/c)
## + edges from ff1ceaf (vertex names):
##  [1] Lysa Arryn       ->Robert Arryn      
##  [2] Jasper Arryn     ->Alys Arryn        
##  [3] Jasper Arryn     ->Jon Arryn         
##  [4] Jon Arryn        ->Robert Arryn      
##  [5] Cersei Lannister ->Tommen Baratheon  
##  [6] Cersei Lannister ->Joffrey Baratheon 
## + ... omitted several edges
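If you want to see how the two tables end up in the graph, here is a minimal sketch with made-up data frames. The column names and values are assumptions for illustration only and are not taken from Shirin's dataset. The first two columns of the edgelist are read as source and target, the first column of the nodelist as the vertex name, and all remaining columns become edge and vertex attributes.

```r
library(igraph)

# Hypothetical edgelist: first two columns are source and target
edges.toy <- data.frame(from = c("Ann", "Ann", "Bob"),
                        to   = c("Cat", "Dan", "Cat"),
                        type = c("mother", "mother", "father"),
                        stringsAsFactors = FALSE)

# Hypothetical nodelist: first column holds the vertex names
nodes.toy <- data.frame(name  = c("Ann", "Bob", "Cat", "Dan"),
                        house = c("X", "Y", "X", "X"),
                        stringsAsFactors = FALSE)

g.toy <- graph_from_data_frame(edges.toy,
                               vertices = nodes.toy,
                               directed = TRUE)

V(g.toy)$house   # extra node columns become vertex attributes
E(g.toy)$type    # extra edge columns become edge attributes
```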

Visualizing networks I

How does the plot look?

plot(g,
     vertex.color = "blue",
     vertex.shape = "circle")

Well, that's quite an uninformative hairball. Such things often happen when we plot large networks. This is a good point to start using the parameters of the igraph plotting function. Next time I will introduce you to my current favorite, ggraph. However, the igraph plotting function is relatively easy to use, so it's worth (if only for the sake of illustration) tweaking the parameters to get THE GoT family graph, as pretty as possible.

For plotting the legend, I am summarizing the edge and node colors upfront. You will later see why.

color_vertices <- nodes.fam %>%
  group_by(house, color) %>%
  summarise(n = n()) %>%
  filter(!is.na(color))

colors_edges <- edges.fam %>%
  group_by(type, color) %>%
  summarise(n = n()) %>%
  filter(!is.na(color))

Ok, we are ready, let's do this plot!

plot(g,
     layout = layout_with_fr(g), # graph layout, more on that later
     vertex.label = gsub(" ", "\n", V(g)$name), # break the label at each space so every word gets its own line
     vertex.shape = V(g)$shape, # load the predefined shape of the nodes (male/female)
     vertex.color = V(g)$color, # load the predefined color of the nodes (houses)
     vertex.size = (V(g)$popularity + 0.5) * 5, # define node-size by popularity
     vertex.frame.color = "gray", # color of the frame around the node
     vertex.label.color = "black", # color of the label
     vertex.label.cex = 0.8, # size of the label
     edge.arrow.size = 0.5, # size of the arrow
     edge.color = E(g)$color, # load the predefined edge color (relationship)
     edge.lty = E(g)$lty) # Edge style
legend("topleft", legend = c(NA, "Node color:", as.character(color_vertices$house), NA, "Edge color:", as.character(colors_edges$type)), pch = 10,
       col = c(NA, NA, color_vertices$color, NA, NA, colors_edges$color), pt.cex = 3, cex = 2, bty = "n", ncol = 1,
       title = "") 
legend("topleft", legend = "", cex = 3, bty = "n", ncol = 1,
       title = "Game of Thrones Family Ties")

What a beauty! Thanks to Shirin’s nice upfront work, it was rather easy to plot it that nicely. However, we could also define our own colors as we wish. Here the R color palettes might come in handy.
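Defining your own colors could look roughly like the sketch below. It uses a made-up toy graph with a hypothetical group attribute and hcl.colors() from base R (available since R 3.6) as one possible palette. Any other palette function would work just as well.

```r
library(igraph)

# Toy graph with a hypothetical grouping attribute (not the GoT houses)
toy <- graph_from_literal(A - B, B - C, C - D)
V(toy)$group <- c("x", "x", "y", "z")

# One color per group, drawn from a base R palette
groups <- unique(V(toy)$group)
pal <- setNames(hcl.colors(length(groups), palette = "Dark 3"), groups)

# Look up each node's color by its group
V(toy)$color <- unname(pal[V(toy)$group])

plot(toy, vertex.color = V(toy)$color)
```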

Community Detection

You might have already guessed: we can also do a clustering exercise in networks. Here, we do not cluster nodes according to their similarity in attributes, but according to their connectivity. There are plenty of algorithms, and we will explore further ones later on. Most of them aim to find communities with maximum within-community connectivity and minimum between-community connectivity.

However, most of them are not designed to work with directed networks. Therefore, we will convert our nice network for now to an undirected one.

g.ud <- as.undirected(g)

First, we will give it a try with the edge-betweenness algorithm (Newman-Girvan). Here, high-betweenness edges are removed sequentially (recalculating at each step), and the partitioning of the network with the highest modularity is selected.

Let's take a look at how it works.

?cluster_edge_betweenness
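The intuition is easiest to grasp on a tiny example. The sketch below uses a made-up toy graph of two triangles joined by a single bridge. The bridge carries all shortest paths between the two triangles, so it has the highest edge betweenness and is removed first, which splits the graph into two communities.

```r
library(igraph)

# Two triangles, A-B-C and D-E-F, joined by the bridge C--D (hypothetical data)
toy <- graph_from_literal(A - B - C - A, D - E - F - D, C - D)

# Edge betweenness per edge: the bridge C--D scores highest
eb <- edge_betweenness(toy)
names(eb) <- apply(ends(toy, E(toy)), 1, paste, collapse = "--")
eb

# Run the algorithm and inspect the resulting communities
ceb.toy <- cluster_edge_betweenness(toy)
length(ceb.toy)       # number of communities found
membership(ceb.toy)   # community id of each node
```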

And now we run it. Since it is a hierarchical community detection technique, we can again plot a dendrogram, which we already know from hierarchical clustering.

ceb <- cluster_edge_betweenness(g.ud) 
dendPlot(ceb, mode="hclust")

plot(ceb, g.ud,
     vertex.frame.color = V(g.ud)$color, # load the predefined color of the nodes (houses)
     vertex.size = (V(g.ud)$popularity + 0.5) * 5 # define node-size by popularity
)

Let's see how well it performs on the major houses only; the rest are too small anyhow.

bind_cols(com = ceb$membership, house = V(g.ud)$house) %>%
  group_by(house) %>%
  filter(n() >= 10) %>%
  ungroup() %>%
  table()
##     house
## com  House Baratheon House Frey House Greyjoy House Lannister
##   1                0          0             0               0
##   2                9          0             0               4
##   3                1          0             0               0
##   4                0         13             0               0
##   5                0          7             0               9
##   6                0          0            14               0
##   7                0          0             0              13
##   8                0          0             0               0
##   9                0          0             0               0
##   10               0          0             0               0
##   11               0          0             0               0
##   12               0          0             0               0
##     house
## com  House Martell House Stark House Targaryen House Tyrell
##   1              0           7               0            0
##   2              0           0               0            1
##   3              2           0              13            0
##   4              0           0               0            0
##   5              0           0               0            0
##   6              0           0               0            0
##   7              0           0               0            0
##   8             11           0               0            0
##   9              0           6               0            0
##   10             0          13               0            0
##   11             0           0               0            9
##   12             0           0               0            5

We see, indeed, that the communities for the most part capture the affiliation to the great houses.

Your turn

Again, it's time to have some fun on your own. HERE you will find another kaggle notebook where you can demonstrate your network analysis skills even more!